{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Requirements\n",
    "> Dataset input requirments"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example we will go through the dataset input requirements of the `core.NeuralForecast` class.\n",
    "\n",
    "The `core.NeuralForecast` methods operate as global models that receive a set of time series rather than single series. The class uses cross-learning technique to fit flexible-shared models such as neural networks improving its generalization capabilities as shown by the M4 international forecasting competition (Smyl 2019, Semenoglou 2021).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can run these experiments using GPU with Google Colab.\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/Nixtla/neuralforecast/blob/main/nbs/examples/Data_Format.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Long format"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple time series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp. Let's see an example using the `datasetsforecast` library.\n",
    "\n",
    "`Y_df = pd.concat( [series1, series2, ...])`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install datasetsforecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from datasetsforecast.m3 import M3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y_df, *_ = M3.load('./data', group='Yearly')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_id</th>\n",
       "      <th>ds</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Y1</td>\n",
       "      <td>1975-12-31</td>\n",
       "      <td>940.66</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Y1</td>\n",
       "      <td>1976-12-31</td>\n",
       "      <td>1084.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Y10</td>\n",
       "      <td>1975-12-31</td>\n",
       "      <td>2160.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Y10</td>\n",
       "      <td>1976-12-31</td>\n",
       "      <td>2553.48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>Y100</td>\n",
       "      <td>1975-12-31</td>\n",
       "      <td>1424.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18260</th>\n",
       "      <td>Y97</td>\n",
       "      <td>1976-12-31</td>\n",
       "      <td>1618.91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18279</th>\n",
       "      <td>Y98</td>\n",
       "      <td>1975-12-31</td>\n",
       "      <td>1164.97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18280</th>\n",
       "      <td>Y98</td>\n",
       "      <td>1976-12-31</td>\n",
       "      <td>1277.87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18299</th>\n",
       "      <td>Y99</td>\n",
       "      <td>1975-12-31</td>\n",
       "      <td>1870.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18300</th>\n",
       "      <td>Y99</td>\n",
       "      <td>1976-12-31</td>\n",
       "      <td>1307.20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1290 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      unique_id         ds        y\n",
       "0            Y1 1975-12-31   940.66\n",
       "1            Y1 1976-12-31  1084.86\n",
       "20          Y10 1975-12-31  2160.04\n",
       "21          Y10 1976-12-31  2553.48\n",
       "40         Y100 1975-12-31  1424.70\n",
       "...         ...        ...      ...\n",
       "18260       Y97 1976-12-31  1618.91\n",
       "18279       Y98 1975-12-31  1164.97\n",
       "18280       Y98 1976-12-31  1277.87\n",
       "18299       Y99 1975-12-31  1870.00\n",
       "18300       Y99 1976-12-31  1307.20\n",
       "\n",
       "[1290 rows x 3 columns]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y_df.groupby('unique_id').head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Y_df` is a dataframe with three columns: `unique_id` with a unique identifier for each time series, a column `ds` with the datestamp and a column `y` with the values of the series."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single time series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have only one time series, you have to include the `unique_id` column. Consider, for example, the [AirPassengers](https://github.com/Nixtla/transfer-learning-time-series/blob/main/datasets/air_passengers.csv) dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>timestamp</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1949-01-01</td>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1949-02-01</td>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1949-03-01</td>\n",
       "      <td>132</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1949-04-01</td>\n",
       "      <td>129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1949-05-01</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139</th>\n",
       "      <td>1960-08-01</td>\n",
       "      <td>606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140</th>\n",
       "      <td>1960-09-01</td>\n",
       "      <td>508</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>141</th>\n",
       "      <td>1960-10-01</td>\n",
       "      <td>461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>142</th>\n",
       "      <td>1960-11-01</td>\n",
       "      <td>390</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>143</th>\n",
       "      <td>1960-12-01</td>\n",
       "      <td>432</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>144 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      timestamp  value\n",
       "0    1949-01-01    112\n",
       "1    1949-02-01    118\n",
       "2    1949-03-01    132\n",
       "3    1949-04-01    129\n",
       "4    1949-05-01    121\n",
       "..          ...    ...\n",
       "139  1960-08-01    606\n",
       "140  1960-09-01    508\n",
       "141  1960-10-01    461\n",
       "142  1960-11-01    390\n",
       "143  1960-12-01    432\n",
       "\n",
       "[144 rows x 2 columns]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')\n",
    "Y_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example `Y_df` only contains two columns: `timestamp`, and `value`. To use `NeuralForecast` we have to include the `unique_id` column and rename the previuos ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_id</th>\n",
       "      <th>ds</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1949-01-01</td>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1949-02-01</td>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1949-03-01</td>\n",
       "      <td>132</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1949-04-01</td>\n",
       "      <td>129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1949-05-01</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1960-08-01</td>\n",
       "      <td>606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1960-09-01</td>\n",
       "      <td>508</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>141</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1960-10-01</td>\n",
       "      <td>461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>142</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1960-11-01</td>\n",
       "      <td>390</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>143</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1960-12-01</td>\n",
       "      <td>432</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>144 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     unique_id          ds    y\n",
       "0          1.0  1949-01-01  112\n",
       "1          1.0  1949-02-01  118\n",
       "2          1.0  1949-03-01  132\n",
       "3          1.0  1949-04-01  129\n",
       "4          1.0  1949-05-01  121\n",
       "..         ...         ...  ...\n",
       "139        1.0  1960-08-01  606\n",
       "140        1.0  1960-09-01  508\n",
       "141        1.0  1960-10-01  461\n",
       "142        1.0  1960-11-01  390\n",
       "143        1.0  1960-12-01  432\n",
       "\n",
       "[144 rows x 3 columns]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y_df['unique_id'] = 1. # We can add an integer as identifier\n",
    "Y_df = Y_df.rename(columns={'timestamp': 'ds', 'value': 'y'})\n",
    "Y_df = Y_df[['unique_id', 'ds', 'y']]\n",
    "Y_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "- [Slawek Smyl. (2019). \"A hybrid method of exponential smoothing and recurrent networks for time series forecasting\". International\n",
    "Journal of Forecasting.](https://www.sciencedirect.com/science/article/pii/S0169207019301153)\n",
    "- [Artemios-Anargyros Semenoglou, Evangelos Spiliotis, Spyros Makridakis, and Vassilios Assimakopoulos. (2021). Investigating the accuracy of cross-learning time series forecasting methods\". International\n",
    "Journal of Forecasting.](https://www.sciencedirect.com/science/article/pii/S0169207020301850)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
